Abstract

Many programmers, when they encounter an error, would like to have 
the benefit of automatic fix suggestions — as long as they are, most of the 
time, adequate. Initial research in this direction has generally limited itself 
to specific areas, such as data structure classes with carefully designed 
interfaces, and relied on simple approaches. 

To provide high-quality fix suggestions in a broad area of applicability, 
the present work relies on the presence of contracts in the code, and on 
the availability of static and dynamic analyses to gather evidence on the 
values taken by expressions derived from the code. 

The ideas have been built into the AutoFix-E2 automatic fix generator.
Applications of AutoFix-E2 to general-purpose software, such as
a library to manipulate documents, show that the approach provides an 
improvement over previous techniques, in particular purely model-based 
approaches. 

1 Introduction 

Debugging — the activity of finding and correcting errors in programs — is so 
everyday in every programmer's job that any improvement at automating even 
parts of it has the potential for a significant impact on productivity and software 
quality. 

While automation remains formidably difficult in general, the last few years
have seen the first successful attempts at providing completely automated
debugging in some situations. This has been achieved by combining several
independently developed techniques: automated testing to detect errors, fault
localization to locate the instructions responsible for the errors, and dynamic
analysis to choose suitable corrections among those applicable to the faulty
instructions. Consider, for example, a routine (method) which removes the last
element in a linked list by getting a reference to it and deallocating it.
Random testing tries the routine on an empty list and exposes an error; fault
localization suggests that the problem is deallocating the last element when it
is void (null); dynamic analysis suggests changing the behavior of the routine
so that deallocation is performed only when the last element exists.

A few premises make such automated debugging techniques work in practice.
First, the majority of errors in programs admit simple fixes consisting of
adding or modifying one or two instructions; correspondingly, exhaustively
generating the set of possible "small" corrections is often computationally
feasible. Second, the availability of contracts (pre- and postconditions, class
invariants) can dramatically improve the accuracy of both error detection and
fault localization.

Our previous work in this area [21] takes advantage of these observations
to analyze faults in object-oriented programs with contracts and to correct
them. The analysis constructs an abstract model of correct and incorrect
executions, which summarizes the information about the program state at
various locations in terms of state invariants. The invariants express the values
returned by public queries (functions) of a class: the same functions that
developers use in the contracts documenting the implementation. Comparing the
invariants that characterize correct and incorrect runs suggests how to fix
errors: whenever the state signals the "incorrect invariant", execute actions to
avoid triggering the error. A behavioral model of the class, also relying on state
invariants, suggests the applicable "recovery" actions. We call this approach
to automated program fixing model-based, given that a model, based on state
invariants, abstracts the correct and incorrect visible behavior. In the example
of the linked list, assume that the class has a query empty, which returns
true when the list contains no elements, and that the correct and incorrect runs
respectively have invariants not empty and empty, because the failure occurs
precisely when the list is empty. A reasonable fix consists of adding a
conditional statement which guards the deallocation instruction and executes it
only when not empty holds.

The efficacy of model-based fixing fundamentally depends on the quality of 
the public interfaces, because invariants are mostly based on public queries. The 
present paper introduces a more general approach to automated fixing which 
works successfully even for classes with few public queries. The approach is still 
based on the dynamic analysis of correct and incorrect runs. However, rather 
than merely monitoring the value of queries, the analysis proactively gathers 
evidence in terms of values taken by expressions appearing in the program 
text. An algorithm built upon fault localization techniques — based on static 
and dynamic analysis — ranks expressions and their values according to their 
likelihood of being indicative of error. The expressions ranking highest are 
prime candidates to guide the generation of fixes: when an expression takes 
a "suspicious" value, execute actions that change the value to "unsuspicious". 
We call this novel approach code-based to designate the white-box search for 
information denoting faults in the program text. In the sketched example of 
the linked list, code-based techniques can build a fix even if a query empty is 
not available, by choosing to monitor the value of the expression denoting the 
reference to the last element in the list. 

The designations "model-based" and "code-based" schematize the essential
differences between the two approaches, but it is important to remark that
the latter is essentially an extension (and improvement) of the former:
code-based techniques also exploit information in the form of state invariants
and public queries to reproduce the results of model-based techniques when these
are successful.

We implemented code-based fixing in the tool AutoFix-E2, successor to
AutoFix-E [21], which implemented model-based techniques. The experiments in
Section 4 demonstrate that code-based techniques can automatically fix more
errors than model-based approaches, even beyond data structure
implementations, which are the natural target of model-based and random-testing
techniques because of their rich public interfaces.
The paper is organized as follows: Section 2 introduces code-based techniques
with an example of a fix which is beyond the capabilities of model-based
techniques; Section 3 details the ingredients of code-based fixing and how they
are combined; Section 4 presents an experimental evaluation of the
implementation AutoFix-E2; Section 5 discusses related work; Section 6 outlines
future work.

2 Automated Fixing: an example 

This section illustrates two faults fixed by AutoFix-E2; the example shows the 
edge of code-based techniques in specific fixing scenarios and is used throughout 
the paper. 

2.1 Two Errors in a Routine 

The EiffelBase class TWO_WAY_SORTED_SET implements a set data structure
with a doubly-linked list. An internal cursor index (an integer attribute) is
used to navigate the content of the set: the actual elements occupy positions 1
to count (another integer attribute, storing the total number of elements in the
set), whereas the indexes 0 and count + 1 correspond to the positions before
the first element and after the last. Listing 1 shows the routine move_item of
this class, which takes an argument v of generic type G that must be a reference
to an element already stored in the set; the routine then moves v from its
current (unique) position in the set to the immediate left of the internal
cursor index. For example, if the set is ⟨a, b, c, v⟩ and index is 2 upon
invocation, move_item (v) changes the set to ⟨a, v, b, c⟩. The routine's
precondition (require) formalizes the constraint on the input. After saving the
cursor position in the local variable idx, the loop in lines 7-10 performs a
linear search for the element v using the internal cursor: when the loop
terminates, index denotes v's position in the set. The three routine calls on
lines 12-14 complete the work: remove takes v out of the set; go_i_th restores
index to its original value idx; put_left puts v back in the set to the left of
the position index.

AutoTest [17] reveals, completely automatically, two errors in this
implementation of move_item. The first error is due to the fact that calling
remove decrements the count of elements in the set by one. AutoTest produces a
test (shown in Figure 1) that calls move_item when index equals count + 1;
after v is removed, this value is not a valid position because it exceeds the
new value of count by two, while a valid cursor ranges between 0 and count + 1.
The test violates go_i_th's precondition (line 17), which enforces the
consistency constraint on index, when invoking it on line 13.

The second error occurs when index has value 0, denoted by the boolean
query before (line 19); this is a valid position for go_i_th but not for
put_left, because there is no position "to the left of 0" where v can be
re-inserted: the call to put_left on line 14 violates its precondition (line 18).

2.2 Code-Based Fixing at Work 

The fault revealed in the invocation of go_i_th is actually a special case of a more 
general error which occurs whenever v appears in the set in a position to the left 
Listing 1: Routines of TWO_WAY_SORTED_SET.

    move_item (v: G)
            -- Move `v' to the left of cursor.
        require v /= Void ; has (v)
        local idx: INTEGER ; found: BOOLEAN
        do
            idx := index
            from start until found or after loop
                found := (v = item)
                if not found then forth end
            end
            check found and not after end
            remove
            go_i_th (idx)
            put_left (v)
        end

    go_i_th (i: INTEGER) require 0 <= i and i <= count + 1
    put_left (v: G) require not before
    before: BOOLEAN do Result := (index = 0) end

[Figure 1 panels: "State of the set before calling remove (for v)" and "State of the set after calling remove (for v)".]

Figure 1: Calling remove in move_item when index = count + 1 holds initially
makes the following invocation of go_i_th (idx) violate a precondition.

of the initial value of index: even if index ≤ count initially, put_left will
insert v in the wrong position as a result of remove decrementing count, which
indirectly shifts the position of every element after v to the left by one. For
example, if index is 3 initially, ⟨a, v, b, c⟩ becomes ⟨a, b, v, c⟩ instead of
staying unchanged after calling move_item (v). Such states leading to erroneous
behavior go undetected by AutoTest because the developers of
TWO_WAY_SORTED_SET provided an incomplete postcondition; more generally, the
class lacks a query to characterize the fault condition in general terms.
Nonetheless, AutoFix-E2 can completely correct the error, beyond the specific
case reported by the failed test: it builds the expression idx > index to
characterize the error state and generates the corresponding fix, introduced
before line 13, which re-scales idx to reflect the fact that the object in
position idx has been shifted left.

    if idx > index then idx := idx - 1 end
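The following Python model (a hypothetical sketch, not EiffelBase code) replays both faults. CursorSet, PreconditionViolation, and the fixed flag are illustrative names; the model assumes that remove leaves the cursor's integer value unchanged, so that it denotes the right neighbour of the removed element, and that put_left inserts at the cursor position, as the description of move_item above implies. With fixed=True it applies the two corrections discussed in this section.

```python
class PreconditionViolation(Exception):
    """Raised when a simulated Eiffel precondition does not hold."""


class CursorSet:
    """Hypothetical model of a cursor-based set: elements occupy positions
    1..count, while cursor values 0 (before) and count + 1 (after) are
    boundary positions."""

    def __init__(self, items, index):
        self.items = list(items)
        self.index = index

    @property
    def count(self):
        return len(self.items)

    def item(self):
        return self.items[self.index - 1]

    def remove(self):
        # Remove the element under the cursor; the cursor keeps its integer
        # value, so it now denotes the right neighbour of the removed element.
        del self.items[self.index - 1]

    def go_i_th(self, i):
        if not 0 <= i <= self.count + 1:
            raise PreconditionViolation("go_i_th: 0 <= i <= count + 1")
        self.index = i

    def put_left(self, v):
        if self.index == 0:  # before
            raise PreconditionViolation("put_left: not before")
        self.items.insert(self.index - 1, v)
        self.index += 1  # the cursor stays on the same element, which moved right


def move_item(s, v, fixed=False):
    """Mirror of move_item in Listing 1; fixed=True adds the two fixes."""
    idx = s.index
    s.index = 1  # start
    while s.index <= s.count and s.item() != v:  # linear search for v
        s.index += 1  # forth
    s.remove()
    if fixed and idx > s.index:  # if idx > index then idx := idx - 1 end
        idx -= 1
    s.go_i_th(idx)
    if fixed and s.index == 0:  # if before then forth end
        s.index += 1
    s.put_left(v)
    return s.items
```

For instance, `move_item(CursorSet(["a", "v", "b", "c"], 3), "v")` returns `["a", "b", "v", "c"]`, whereas the same call with `fixed=True` leaves the list unchanged; starting from index = count + 1, the unfixed version raises the simulated go_i_th precondition violation.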

The error in the invocation of put_left, on the other hand, is accurately
characterized by the public query before, which returns True whenever the call
on line 14 triggers a precondition violation. The correction suggested
automatically by AutoFix-E2 adds the instruction if before then forth end right before
Figure 2: How code-based fixing works. Run AutoTest to automatically generate
passing and failing test cases for the input Eiffel classes (Section 3.1);
extract a set of expressions from the text of the classes' routines (Section
3.2); compute the expression dependence score edep between expressions,
measuring their syntactic similarity (Section 3.3.2), and the control dependence
score cdep between program locations, measuring how close they are on the
control-flow graph (Section 3.3.1); compute the dynamic score dyn between
expressions, measuring how much more often an expression is mentioned in
failing rather than passing test cases (Section 3.4); combine the three scores
(edep, cdep, dyn) into the score fixme, which determines a global ranking of
expressions (Section 3.5); enumerate possible fixing actions for the expressions
with the highest fixme score (Section 3.6); generate candidate fixes by injecting
the fix actions into the faulty routine (Section 3.7); the candidate fixes that
pass the whole regression test suite are considered valid (Section 3.8).



line 14: forth moves the cursor to the first position, which is valid for put_left.



2.3 Model-Based Fixing at Work 

How do model-based techniques, implemented in AutoFix-E, perform on the two
errors shown? The error in the invocation of put_left has a characterization in
terms of public queries and state invariants, hence AutoFix-E also produces a
correct fix, equivalent to the one from AutoFix-E2.

Model-based techniques, however, can correct the other error, in the
invocation of go_i_th, only for the specific instance exposed by the test case
where index = count + 1, that is, when after holds. Based on this, a possible
partial fix consists of adding if after then back end as the first instruction
on line 5. This fix is not only partial but also unlikely to be generated in
practice: it modifies code which is several instructions away from where the
contract violation occurs, whereas AutoFix-E's heuristics favor local fixes to
restrict the search space. As shown above, code-based techniques do not suffer
from these limitations.



3 Code-Based Fixing 

This section describes how code-based fixing works; Figure 2 depicts the main
steps of the process. All the running examples refer to Listing 1.

Code-based fixing works on Eiffel classes equipped with contracts [H]:
preconditions, postconditions, and class invariants. Each contract element
consists of one or more clauses; for example, move_item's precondition on line 3
has two clauses: v /= Void and has (v). The contracts of a class constitute its
executable specification, hence provide a way to detect functional errors in
the implementation: a routine called in a state not satisfying its precondition,
terminating in a state not satisfying its postcondition or violating the class
invariant, or reaching an intermediate assertion that is not satisfied.



3.1 Test-Case Generation 

Every session of code-based fixing starts by collecting information about the 
runtime behavior of the routine under fix. The raw form of such information 
is a collection of test cases, each a sequence of object creations and routine 
invocations on the objects. A test case is passing if it does not violate any 
contract and failing otherwise. Two failing test cases correspond to the same 
error if they violate the same contract clause at the same program location; this 
assumption is reasonable, given that different clauses of the same contract are 
usually orthogonal. 

Code-based fixing takes a set T of test cases as input, and uses them for
dynamic analysis (Section 3.4) and fix validation (Section 3.8). P and F
respectively denote the sets of all passing and failing test cases in T.
$F^{c,j}$ denotes a set of failing test cases violating the clause c at program
location j. For example, the set of test cases violating put_left's precondition
in move_item is denoted by $F^{\text{not before},\,14}$. Each session of code-based
fixing targets a single fault.
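A minimal Python sketch of this bookkeeping, under the assumption that each executed test case is summarized by the contract clause and location it violates, or by nothing if it passes; Outcome and partition are hypothetical names. Grouping failing test cases by the pair (c, j) implements the same-error criterion stated above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Violation = Tuple[str, int]  # (violated contract clause c, program location j)


@dataclass(frozen=True)
class Outcome:
    """Hypothetical record of one executed test case."""
    name: str
    violation: Optional[Violation] = None  # None means the test case passed


def partition(tests: List[Outcome]) -> Tuple[List[Outcome], Dict[Violation, List[Outcome]]]:
    """Split T into the passing set P and one failing set F^{c,j} per error."""
    passing = [t for t in tests if t.violation is None]
    failing: Dict[Violation, List[Outcome]] = defaultdict(list)
    for t in tests:
        if t.violation is not None:
            failing[t.violation].append(t)
    return passing, dict(failing)
```

Each key of the returned dictionary identifies one error, so a fixing session would pick exactly one of them.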

The rest of the code-based fixing process is independent of whether the
test cases T are generated automatically or written manually. AutoFix-E2 uses
the random testing framework AutoTest [17] developed in previous work of
ours. The use of AutoTest makes the fixing process in AutoFix-E2 completely
automatic. The experiments described in Section 4 demonstrate that the test
cases generated by AutoTest are suitable inputs to AutoFix-E2 and support the
generation of effective fixes without any human intervention.



3.2 Predicates, Expressions, and States 

Evidence takes the form of boolean predicates, built by combining expressions
extracted from the program text and the violated contract clause. The
evaluation of a predicate at a program location gives a component of the program
state at that location. Sections 3.3 and 3.4 rank components according to their
"suspiciousness" of being responsible for the occurrence of an error.



3.2.1 Expressions 

For a routine r and a violated assertion clause c, $E_{r,c}$ denotes the set of
non-constant expressions (of any type) which appear in r's body or in c. For
example, for routine before and clause index > 1, $E_{\text{before},\,index > 1}$ is
{Result, index, index = 0, index > 1}. The set $\overline{E}_{r,c}$ extends $E_{r,c}$ by
unfolding [18]: $\overline{E}_{r,c}$ includes all elements in $E_{r,c}$ and, for every
$e \in E_{r,c}$ of reference type t and for every argument-less query q applicable
to objects of type t, $\overline{E}_{r,c}$ also includes the expression e.q. Continuing
the example, $\overline{E}_{\text{before},\,index > 1} = E_{\text{before},\,index > 1}$ because all
the expressions in $E_{\text{before},\,index > 1}$ are of primitive type (integer or
boolean).
3.2.2 Predicates



The set $P_{r,c}$ of boolean predicates generated for r and c contains the following
elements:

Boolean expressions: $b$, for every $b \in \overline{E}_{r,c}$ of boolean type;

Voidness checks: $e = \text{Void}$, for every $e \in \overline{E}_{r,c}$ of reference type;

Integer comparisons: $e \sim e'$, for every $e \in \overline{E}_{r,c}$ of integer type, every
$e' \in (\overline{E}_{r,c} \setminus \{e\}) \cup \{0\}$ also of integer type, and every $\sim$ in $\{=, <, \leq\}$;

Complements: $\neg p$, for every $p \in P_{r,c}$.

For example, $P_{\text{before},\,index > 1}$ contains Result, not Result, $index \sim 0$ and
$index \sim 1$ (with $\sim \in \{=, \neq, <, \leq, >, \geq\}$).
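The following Python sketch applies the four generation rules to expressions given as Eiffel source text with a type tag; predicates is a hypothetical helper, complements are written not (p), and removing textual duplicates (index = 0 arises both as a boolean expression and as a comparison) is our choice rather than something the rules prescribe. Comparisons are generated only against other integer expressions and the constant 0, exactly as the rules above state.

```python
def predicates(exprs):
    """Build P_{r,c} from a dict mapping expression text to
    'boolean', 'integer' or 'reference'."""
    base = []
    for e, ty in exprs.items():
        if ty == "boolean":
            base.append(e)                      # boolean expressions
        elif ty == "reference":
            base.append(f"{e} = Void")          # voidness checks
    ints = [e for e, ty in exprs.items() if ty == "integer"]
    for e in ints:
        for other in [x for x in ints if x != e] + ["0"]:
            for op in ("=", "<", "<="):
                base.append(f"{e} {op} {other}")  # integer comparisons
    base = list(dict.fromkeys(base))            # drop textual duplicates, keep order
    return base + [f"not ({p})" for p in base]  # complements
```

Applied to {Result, index, index = 0, index > 1}, the sketch yields 10 predicates: five base ones and their complements.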



3.2.3 State Components 

A test case $t \in T$ describes a sequence $loc(t) = \ell_1, \ell_2, \ldots$ of executed
program locations. For an expression e and a location $\ell \in loc(t)$, $[e]^t_\ell$ is the
value of e at $\ell$ in t, if e can be evaluated at $\ell$.

The evaluation of predicate p at location $\ell$ defines the triple $\langle \ell, p, v \rangle$, where
$v = [p]^t_\ell$ is the value for some test case t which reaches $\ell$; a test case t may
define multiple triples with the same $\ell$ if $\ell$ appears more than once in loc(t).
comp(T) denotes all the triples $\langle \ell, p, v \rangle$ defined by the tests in the set T; they
are the components of the program state during the tests. In the running
example, every test case reaching location 6 defines $\langle 6, v = \text{Void}, \text{False} \rangle$,
because the precondition guarantees v /= Void, but does not define any triple
$\langle 6, \text{Result}, \cdot \rangle$, because Result is not a variable in the scope of move_item.
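To make comp(T) concrete, the sketch below assumes that each test case is recorded as the list of executed locations, each paired with the values of the predicates that can be evaluated there; the type aliases and function names are hypothetical. A location executed twice with different values contributes two triples, as described above, and membership of a triple in a test case reduces to a set lookup.

```python
from typing import Dict, Iterable, List, Set, Tuple

Component = Tuple[int, str, bool]          # <location, predicate, value>
Trace = List[Tuple[int, Dict[str, bool]]]  # executed locations with predicate values


def components(trace: Trace) -> Set[Component]:
    """Triples defined by one test case; repeated locations may add several."""
    return {(loc, p, v) for loc, valuation in trace for p, v in valuation.items()}


def comp(tests: Iterable[Trace]) -> Set[Component]:
    """comp(T): union of the triples defined by all test cases in T."""
    result: Set[Component] = set()
    for trace in tests:
        result |= components(trace)
    return result
```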



Sections 3.3 to 3.5 show how to rank components according to heuristics which
take into account static and dynamic measures. The ranking heuristic fixme
summarizes various sources of evidence; a triple $\langle \ell, p, v \rangle$ appearing high in the
ranking indicates that an error is likely to have its origin at location $\ell$ when
predicate p evaluates to v. Correspondingly, the fixes generated automatically
try to change the value of p at $\ell$ whenever it is v (Section 3.6).



3.3 Static Analysis 

Static analysis extracts evidence from the program text independently of the
runtime behavior: control dependence measures the distance, in terms of number
of instructions, between two program locations; expression dependence measures
the syntactic similarity between two predicates. We use control dependence to
estimate the proximity of a location to where a failure is triggered; then, we
further differentiate among expressions evaluated at nearby program locations
according to a simple syntactic measure of similarity between each expression
and the violated contract clause. Such a lightweight static analysis is
sufficient for code-based fixing to work, given that the primary source of
evidence comes from dynamic analysis (Section 3.4) anyway.
3.3.1 Control Dependence



For two program locations $\ell_1, \ell_2$, write $\ell_1 \rightsquigarrow \ell_2$ if $\ell_1$ and $\ell_2$ belong to the same
routine and there exists a directed path from $\ell_1$ to $\ell_2$ on the control-flow graph
of the routine's body; otherwise, $\ell_1 \not\rightsquigarrow \ell_2$. The control distance $cdist(\ell_1, \ell_2)$
of two program locations is the length of the shortest directed path from $\ell_1$
to $\ell_2$ on the control-flow graph if $\ell_1 \rightsquigarrow \ell_2$, and $\infty$ if $\ell_1 \not\rightsquigarrow \ell_2$. For example,
$cdist(8, 12) = 4$ in Listing 1.

Correspondingly, the control dependence $cdep(\ell, j)$ is the normalized score:

$$
cdep(\ell, j) = 1 - \frac{cdist(\ell, j)}{\max\{\, cdist(\lambda, j) \mid \lambda \in r \text{ and } \lambda \rightsquigarrow j \,\}}
$$

for $\ell \rightsquigarrow j$, and $cdep(\ell, j) = 0$ for $\ell \not\rightsquigarrow j$. We use control dependence to rank
locations according to proximity to the location of failure.
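A possible Python rendering of cdist and cdep, computing shortest paths by breadth-first search. The control-flow graph of move_item below is our own reading of Listing 1 (location 9 loops back to the exit test on line 7, and line 10 is not a location of its own); with it, cdist(8, 12) = 4 as in the text. Returning 1.0 when every location reaching j is j itself, which would make the denominator zero, is an assumption.

```python
import math
from collections import deque

# Assumed control-flow graph of move_item's body (locations are listing lines).
MOVE_ITEM_CFG = {6: [7], 7: [8, 11], 8: [9], 9: [7], 11: [12], 12: [13], 13: [14], 14: []}


def cdist(cfg, src, dst):
    """Length of the shortest directed path from src to dst, or inf if none."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for succ in cfg.get(node, ()):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    return math.inf


def cdep(cfg, loc, j):
    """Normalized control dependence of location loc on the failure location j."""
    d = cdist(cfg, loc, j)
    if d == math.inf:
        return 0.0
    farthest = max(x for x in (cdist(cfg, lam, j) for lam in cfg) if x != math.inf)
    return 1.0 if farthest == 0 else 1 - d / farthest
```

With this graph, cdep(13, 14) = 1 - 1/6 = 5/6, since the farthest location reaching 14 is 8, at distance 6.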

3.3.2 Expression Dependence 

For an expression e, define the set sub(e) of its sub-expressions as follows:

• $e \in sub(e)$;

• if $e' \in sub(e)$ is a query call of the form $t.q\,(a_1, \ldots, a_m)$ for $m \geq 0$, then
$t \in sub(e)$ and $a_i \in sub(e)$ for all $1 \leq i \leq m$.

This definition also accommodates infix operators (such as boolean connectives
and arithmetic operators), which are just syntactic sugar for query calls; for
example, a and b are both sub-expressions of a + b, a shorthand for a.plus (b).
Also, unqualified query calls are treated as qualified calls on the implicit
target Current.

The expression proximity $eprox(e_1, e_2)$ of two expressions $e_1, e_2$ measures
how similar $e_1$ and $e_2$ are in terms of shared sub-expressions:
$eprox(e_1, e_2) = |sub(e_1) \cap sub(e_2)|$. For example, $eprox(i \leq count,\ 0 \leq i \leq count + 1)$
is 2, corresponding to the shared sub-expressions i and count. The larger the
expression proximity between two expressions, the more similar they are.

Correspondingly, the expression dependence $edep(p, c)$ is the normalized score
measuring the amount of evidence that p and c are syntactically similar:

$$
edep(p, c) = \frac{eprox(p, c)}{\max\{\, eprox(\pi, c) \mid \pi \in P_{r,c} \,\}}
$$

In routine before, for example, $edep(index, index = 0)$ is 1/3 because
$eprox(index, index = 0) = 1$ and index = 0 itself has the maximum expression
proximity, 3, to index = 0. We use expression dependence to rank expressions
according to similarity to the contract violated by a failure. Expression
dependence is meaningful only for expressions evaluated in the same local
environment (that is, with strong control dependence), where the same syntax is
likely to refer to identical program elements.
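The following Python sketch represents expressions as nested tuples with the operator first, and reproduces the two examples above. The encoding, the function names, and the treatment of the chained comparison 0 ≤ i ≤ count + 1 as a conjunction are assumptions; the implicit target Current is not modelled, so count stays an atom.

```python
def sub(e):
    """Sub-expressions: e itself plus, recursively, the operands of every operator.

    Expressions are strings (atoms) or tuples (operator, operand_1, ..., operand_n);
    an infix operator stands for a query call whose target and argument are operands.
    """
    result = {e}
    if isinstance(e, tuple):
        for operand in e[1:]:
            result |= sub(operand)
    return result


def eprox(e1, e2):
    """Expression proximity: number of shared sub-expressions."""
    return len(sub(e1) & sub(e2))


def edep(p, c, predicates):
    """Expression dependence of p on the violated clause c, normalized over P_{r,c}."""
    best = max(eprox(pi, c) for pi in predicates)
    return eprox(p, c) / best if best else 0.0
```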



3.4 Dynamic Analysis 

Dynamic analysis extracts evidence from test cases in the form of a score
associated with every state component generated. The higher the score
$dyn(\ell, p, v)$ a component $\langle \ell, p, v \rangle$ receives, the more strongly the runtime
behavior suggests that an error originates at location $\ell$ when predicate p
evaluates to v.
3.4.1 Principles to Compute the Score



Consider an error violating the contract clause c at location j in some routine
r. Let $P_r$ be a set of passing test cases exercising routine r, $F_r^{c,j}$ a set of
failing test cases exposing the same error, and $T_r^{c,j}$ the union $P_r \cup F_r^{c,j}$.
$comp(T_r^{c,j})$ is then the set of components describing correct and incorrect
behavior of r.

For a test case $t \in T_r^{c,j}$ and a component $\langle \ell, p, v \rangle$ such that $\ell$ is a location in
r's body, write $\langle \ell, p, v \rangle \in t$ if t reaches location $\ell$ at least once and p evaluates
to v there:

$$
\langle \ell, p, v \rangle \in t \iff \exists\, \ell_i \in loc(t) : \ell = \ell_i \wedge v = [p]^t_{\ell_i}
$$

For every test case $t \in T_r^{c,j}$ such that $\langle \ell, p, v \rangle \in t$, $\sigma(t)$ denotes its contribution
to the score of $\langle \ell, p, v \rangle$: a large $\sigma(t)$ denotes evidence that $\langle \ell, p, v \rangle$ is a likely
"source" of error if t is a failing test case, and evidence against it if t is
passing.



Section 3.4.2 builds a function $\sigma$ according to the following principles:

(a) If there is at least one failing test case t such that $\langle \ell, p, v \rangle \in t$, the overall
score assigned to $\langle \ell, p, v \rangle$ must be positive: the evidence provided by failing
test cases cannot be canceled out completely.

(b) The magnitude of each failing (resp. passing) test case's contribution $\sigma(t)$
to the score assigned to $\langle \ell, p, v \rangle$ decreases as more failing (resp. passing) test
cases for that component are available: the evidence provided by the first
few test cases is crucial, while repeated outcomes carry a lower weight.

(c) The evidence provided by one failing test case is stronger than the evidence
provided by one passing test case.

The first two principles follow Wong et al.'s "Heuristic III" [23], which, in
experiments by the same authors, yielded better fault localization accuracy
than most alternative approaches. According to these principles, components
appearing only in failing test cases are more likely to be fault causes.

Our dynamic analysis assigns scores according to the same basic principles
as Wong et al.'s, but with differences suggested by the ultimate goal of
automatic fixing: our score ranks state components rather than program
locations, and assigns weights to test cases differently. Contracts
significantly help find the location responsible for a fault: in many cases, it
is close to where the contract violation occurred. Automatic fixing, on the
other hand, requires gathering information not only about the location but also
about the state "responsible" for the fault. This observation prompted us to
apply the fault localization principles to state components.



3.4.2 Score from Dynamic Analysis 

Assume an arbitrary order on the test cases and let $\sigma(t)$ be $\alpha^i$ for the i-th
failing test case t and $\beta\alpha^i$ for the i-th passing test case, counting from
$i = 1$. Selecting $0 < \alpha < 1$ decreases the contribution of each test case
exponentially, which meets principle (b); then, selecting $0 < \beta < 1$ fulfills
principle (c).

The evidence provided by each test case adds up:

$$
dyn(\ell, p, v) = \gamma + \sum\{\, \sigma(t) \mid t \in F_r^{c,j} \wedge \langle \ell, p, v \rangle \in t \,\} - \sum\{\, \sigma(t) \mid t \in P_r \wedge \langle \ell, p, v \rangle \in t \,\}
$$

for some $\gamma > 0$; the chosen ordering is immaterial. We compute the score with
the closed form of geometric progressions:

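$$
\#_p(\ell, p, v) = |\{\, t \in P_r \mid \langle \ell, p, v \rangle \in t \,\}| \qquad
\#_f(\ell, p, v) = |\{\, t \in F_r^{c,j} \mid \langle \ell, p, v \rangle \in t \,\}|
$$

$$
dyn(\ell, p, v) = \gamma + \frac{\alpha}{1 - \alpha}\left(1 - \beta + \beta\,\alpha^{\#_p(\ell, p, v)} - \alpha^{\#_f(\ell, p, v)}\right)
$$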

where $\#_p(\ell, p, v)$ and $\#_f(\ell, p, v)$ are the numbers of passing and failing test cases
that determine the component $\langle \ell, p, v \rangle$. It is straightforward to prove that
$dyn(\ell, p, v)$ is positive if $\#_f(\ell, p, v) \geq 1$, for every $0 < \alpha, \beta < 1$ such that
$\alpha + \beta \leq 1$, hence the score meets principle (a) as well. Some empirical evaluation
suggested setting $\alpha = 1/3$, $\beta = 2/3$, and $\gamma = 1$ in the current implementation of
AutoFix-E2.
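A short Python check that the closed form agrees with the direct summation, assuming test cases are counted from i = 1 (the indexing under which the bounds 2/3 and 3/2 quoted in Section 3.5 hold). The defaults are the parameter values given above; the function names are hypothetical.

```python
def dyn_sum(n_fail, n_pass, alpha=1/3, beta=2/3, gamma=1.0):
    """Direct summation: the i-th failing test case (i = 1, 2, ...) adds
    alpha**i, and the i-th passing test case subtracts beta * alpha**i."""
    gain = sum(alpha ** i for i in range(1, n_fail + 1))
    loss = sum(beta * alpha ** i for i in range(1, n_pass + 1))
    return gamma + gain - loss


def dyn_closed(n_fail, n_pass, alpha=1/3, beta=2/3, gamma=1.0):
    """Closed form of the same two geometric sums."""
    return gamma + alpha / (1 - alpha) * (1 - beta + beta * alpha ** n_pass - alpha ** n_fail)
```

With many passing and no failing test cases the score approaches 2/3; with many failing and no passing ones it approaches 3/2.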



3.5 Combining Static and Dynamic Analysis 

The final output of the analysis phase combines static and dynamic analysis to
assign a "suspiciousness" score $fixme(\ell, p, v)$ to every state component $\langle \ell, p, v \rangle$.

Expression dependence and control dependence are both ratios, and the score
from dynamic analysis is essentially a sum of fractional values. This suggests [3]
combining the three scores by harmonic mean:

$$
fixme(\ell, p, v) = \frac{3}{edep(p, c)^{-1} + cdep(\ell, j)^{-1} + dyn(\ell, p, v)^{-1}}
$$

The current choice of parameters $\alpha, \beta, \gamma$ makes the dynamic score $dyn(\ell, p, v)$
dominant in determining the overall score $fixme(\ell, p, v)$: while expression and
control dependence vary between 0 and 1, the dynamic score has minimum 2/3
(for zero failing test cases and indefinitely many passing) and maximum 3/2 (for
zero passing test cases and indefinitely many failing). This range difference is
consistent with the principle that dynamic analysis gives the primary source of
evidence, whereas the less precise evidence provided by static analysis is useful
to discriminate among components with similar dynamic behavior.
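A minimal Python version of the combination step, written as a plain function of the three scores. Mapping a zero score to 0 (the limit of the harmonic mean as any argument tends to 0) instead of dividing by zero is our assumption.

```python
def fixme(edep, cdep, dyn):
    """Harmonic mean of expression dependence, control dependence and dynamic score.

    A zero or negative score is mapped to 0 rather than dividing by zero;
    this guard is an assumption of the sketch.
    """
    scores = (edep, cdep, dyn)
    if any(s <= 0 for s in scores):
        return 0.0
    return 3 / sum(1 / s for s in scores)
```

For example, a component unreachable from the failure location (cdep = 0) gets fixme = 0 regardless of its dynamic score.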



3.6 Fixing Actions 

Consider a component $\langle \ell, p, v \rangle$ with a high evidence score $fixme(\ell, p, v)$. $\langle \ell, p, v \rangle$
induces a number of possible actions (instructions) which try to avoid using the
value v of p at $\ell$. The actions may either modify p directly (Section 3.6.2) or
change the usage of p in the instruction at $\ell$ (Section 3.6.3).



3.6.1 Derived Expressions 

Expressions of boolean and integer type are modified according to standard
patterns which may reverse common sources of mistakes, such as "off-by-one"
errors. For an expression e, the set ederiv(e) includes:

• if e is of boolean type, the constants True and False, and the expression
not e;

• if e is of integer type, the constants 0, 1, -1, and the expressions e + 1 and
e - 1.
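A small Python sketch of ederiv operating on Eiffel expression text. The regular expression deciding when to parenthesize (plain or dotted identifiers stay bare, anything else is wrapped) is an assumption made so that, for example, the negation of not found reads not (not found).

```python
import re

# Plain or dotted identifiers such as idx or Current.index need no parentheses.
_ATOM = re.compile(r"[A-Za-z_]\w*(\.[A-Za-z_]\w*)*")


def ederiv(e, ty):
    """Derived expressions of the Eiffel expression text e of type ty."""
    operand = e if _ATOM.fullmatch(e) else f"({e})"  # parenthesize compound expressions
    if ty == "boolean":
        return ["True", "False", f"not {operand}"]
    if ty == "integer":
        return ["0", "1", "-1", f"{operand} + 1", f"{operand} - 1"]
    return []  # other types have no derived expressions
```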
3.6.2 Expression Modification

One way to change a state component is to directly modify the expression of
that component. An expression e is modifiable at $\ell$ if: e is of reference type; or
e is of integer type and the assignment e := 0 can be executed at $\ell$; or e is of
boolean type and the assignment e := True can be executed at $\ell$. For example,
index is modifiable everywhere in routine move_item because it is an attribute
of the enclosing class; in routine go_i_th, instead, i is not modifiable anywhere
because arguments are read-only in Eiffel.

Since an expression in a state component may not be directly modifiable, we
also consider sub-expressions. The definition of sub-expression (Section 3.3.2)
induces a partial order $\preceq$: $e_1 \preceq e_2$ iff $e_1 \in sub(e_2)$; correspondingly, it
defines the largest expressions in a set. For example, the largest expressions of
integer type in sub(idx < index or after) are idx and index. A pair $\langle \ell, p \rangle$
determines the set $targ(\ell, p)$ of target expressions: $targ(\ell, p)$ includes the largest
expressions among p's sub-expressions sub(p) that are modifiable at $\ell$. For
example, targ(13, idx > Current.index) on Listing 1 includes the integer
expressions Current.index and idx, but no reference (Current is a sub-expression
of Current.index) or boolean expression (idx > Current.index is not modifiable
according to the definition).

Finally, populate the set $emod(\ell, p)$ of expression modifications induced by
the component $\langle \ell, p, v \rangle$ as follows:

• for $e \in targ(\ell, p)$ of boolean or integer type and every derived expression
$d \in ederiv(e)$, include e := d in $emod(\ell, p)$;

• for $e \in targ(\ell, p)$ of reference type, if $e.c\,(a_1, \ldots, a_n)$ is a call to a command
(procedure) c executable at $\ell$, include $e.c\,(a_1, \ldots, a_n)$ in $emod(\ell, p)$.

In the running example, emod(13, idx > Current.index) includes assignments
of 0, 1 and -1 to idx and index, and unit increments and decrements of the
same variables.
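The Python sketch below computes targ and the assignment part of emod for the running example. Expressions are nested tuples, with (".", target, query) standing for a qualified query call whose only operand is its target. Which expressions are modifiable, and their types, is supplied by hand as an assumption (deciding modifiability would require the class text), and command calls on reference targets are omitted.

```python
def operands(e):
    """Direct operands of e: atoms are strings, (".", target, query) is a
    qualified query call, any other tuple is (operator, left, right)."""
    if not isinstance(e, tuple):
        return ()
    return (e[1],) if e[0] == "." else e[1:]


def sub(e):
    result = {e}
    for o in operands(e):
        result |= sub(o)
    return result


def text(e):
    """Eiffel source text; queries on Current are written unqualified."""
    if isinstance(e, str):
        return e
    if e[0] == ".":
        return e[2] if e[1] == "Current" else f"{text(e[1])}.{e[2]}"
    return f"{text(e[1])} {e[0]} {text(e[2])}"


def targ(p, modifiable):
    """Largest sub-expressions of p that are modifiable at the location."""
    cands = [e for e in sub(p) if e in modifiable]
    return [e for e in cands if not any(e != o and e in sub(o) for o in cands)]


def emod(p, modifiable):
    """Assignments e := d for integer and boolean targets.

    modifiable maps each modifiable expression to its type."""
    actions = []
    for e in targ(p, modifiable):
        name, ty = text(e), modifiable[e]
        if ty == "integer":
            derived = ["0", "1", "-1", f"{name} + 1", f"{name} - 1"]
        elif ty == "boolean":
            derived = ["True", "False", f"not {name}"]
        else:
            continue  # reference targets would yield command calls, omitted here
        actions += [f"{name} := {d}" for d in derived]
    return sorted(actions)
```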

3.6.3 Expression Replacement 

There are cases where expression modification is infeasible or undesirable. For
example, expression i in routine go_i_th does not have any modifiable
sub-expression. In such situations, expression replacement directly substitutes
the usage of expressions in instructions.

Every location $\ell$ labels either a primitive instruction (an assignment or a
routine call) or a boolean condition (the branching condition of an if instruction
or the exit condition of a loop). Correspondingly, define the set $sub(\ell)$ of
sub-expressions of a location $\ell$ as follows:

• if $\ell$ labels a boolean condition b then $sub(\ell) = sub(b)$;

• if $\ell$ labels an assignment u := e then $sub(\ell) = sub(e)$;

• if $\ell$ labels a routine call $t.c\,(a_1, \ldots, a_n)$ then $sub(\ell) = \bigcup\{\, sub(a_i) \mid 1 \leq i \leq n \,\}$.

A pair $\langle \ell, p \rangle$ determines the set $erepl(\ell, p)$ of instructions with replaced
expressions as follows: for each expression e among the largest sub-expressions
of boolean or integer type in sub(p), if $e \in sub(\ell)$ then include $\ell[e \mapsto e']$ in
$erepl(\ell, p)$, for every $e' \in ederiv(e)$. $\ell[e \mapsto e']$ denotes the instruction obtained
by replacing every occurrence of e at location $\ell$ with e'; if $\ell$ labels a boolean
condition, $\ell[e \mapsto e']$ denotes the whole instruction (conditional or loop) but e'
replaces e only in the boolean condition.

In the continued example, erepl(13, idx > index) includes go_i_th (idx - 1),
go_i_th (idx + 1), go_i_th (0), go_i_th (1), and go_i_th (-1);
erepl(13, idx + 1 > index), however, is empty because the two largest integer
sub-expressions in idx + 1 > index are idx + 1 and index, neither of which is a
sub-expression of idx in go_i_th (idx). In the same routine, erepl(9, found) includes
the conditional instructions if not (not found) then forth end, if True then
forth end, and if False then forth end.
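The integer case of erepl(ℓ, p) for a location labeling a routine call can be reproduced with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). This is a sketch under several assumptions: Eiffel expressions are written in Python-compatible syntax, variable types come from a hand-written environment, "largest sub-expressions of type T" is read as the maximal sub-expressions of that type, and `ederiv`, `expr_type` and `Replace` are hypothetical helpers rather than AutoFix-E2 code.

```python
import ast
from typing import Dict, List, Optional, Set


def ederiv(expr: str, typ: str) -> List[str]:
    """Hypothetical derived expressions, inferred from the running example."""
    if typ == "INTEGER":
        return [f"{expr} + 1", f"{expr} - 1", "0", "1", "-1"]
    if typ == "BOOLEAN":
        return [f"not {expr}", "True", "False"]
    return []


def expr_type(node: ast.AST, env: Dict[str, str]) -> Optional[str]:
    """Crude type inference; variable types come from `env`."""
    if isinstance(node, ast.Name):
        return env.get(node.id)
    if isinstance(node, ast.Constant):
        if isinstance(node.value, bool):
            return "BOOLEAN"
        return "INTEGER" if isinstance(node.value, int) else None
    if isinstance(node, (ast.Compare, ast.BoolOp)):
        return "BOOLEAN"
    if isinstance(node, ast.UnaryOp):
        return "BOOLEAN" if isinstance(node.op, ast.Not) else expr_type(node.operand, env)
    if isinstance(node, ast.BinOp):
        return "INTEGER"
    return None


def largest_subexprs(node: ast.AST, typ: str, env: Dict[str, str]) -> List[ast.AST]:
    """Maximal sub-expressions of `node` whose type is `typ`."""
    if expr_type(node, env) == typ:
        return [node]
    found: List[ast.AST] = []
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.expr):
            found.extend(largest_subexprs(child, typ, env))
    return found


def sub_of_call(call: ast.Call) -> Set[str]:
    """sub(l) for a routine call: union of the sub-expressions of its arguments."""
    return {ast.unparse(n) for arg in call.args for n in ast.walk(arg) if isinstance(n, ast.expr)}


class Replace(ast.NodeTransformer):
    """Replaces every occurrence of expression `old` with expression `new`."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit(self, node: ast.AST) -> ast.AST:
        if isinstance(node, ast.expr) and ast.unparse(node) == self.old:
            return ast.parse(self.new, mode="eval").body
        return super().visit(node)


def erepl(instruction: str, predicate: str, env: Dict[str, str]) -> List[str]:
    """erepl(l, p) for a location l labeling the routine call `instruction`."""
    sub_l = sub_of_call(ast.parse(instruction, mode="eval").body)
    pred = ast.parse(predicate, mode="eval").body
    result = []
    for typ in ("INTEGER", "BOOLEAN"):
        for e in largest_subexprs(pred, typ, env):
            e_src = ast.unparse(e)
            if e_src not in sub_l:
                continue
            for derived in ederiv(e_src, typ):
                tree = ast.parse(instruction, mode="eval").body
                result.append(ast.unparse(Replace(e_src, derived).visit(tree)))
    return result


env = {"idx": "INTEGER", "index": "INTEGER"}
print(erepl("go_i_th(idx)", "idx > index", env))
print(erepl("go_i_th(idx)", "idx + 1 > index", env))  # empty, as in the text
```

The first call yields the five replaced calls listed above; the second is empty because neither idx + 1 nor index occurs in go_i_th(idx).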

3.7 Fix Candidate Generation 

At this point, for any "suspicious" state component (ℓ, p, v) we can generate
actions that change the value (Section 3.6.2) or the usage (Section 3.6.3) of p
at ℓ. Each such action generates a candidate fix if injected at location ℓ. The
injection consists of first selecting a fix schema (Section 3.7.1), then
instantiating the schema with p and an action derived from p (Section 3.7.2).



3.7.1 Fix Schemas 

We use the same fix schemas as model-based fixing [21], shown in Table 1.

    (a)            (b)              (c)                  (d)
    snippet        if fail then     if not fail then     if fail then
    old_stmt         snippet          old_stmt             snippet
                   end              end                  else
                   old_stmt                                old_stmt
                                                         end

Table 1: Fix schemas.



3.7.2 Schema Instantiation 

For a state component (ℓ, p, v) determined by the passing test cases P_r of routine
r and the failing test cases F_r^{c,j} violating the contract clause c at location j,
instantiate each of the schemas in Table 1 as follows:

• fail takes p = v, the component's predicate and value;

• snippet takes any value in emod(ℓ, p) ∪ erepl(ℓ, p) (defined in Sections 3.6.2
and 3.6.3);

• old_stmt is the instruction at location ℓ.

The instantiated schema replaces the instruction at location ℓ in routine r; the
modified routine is a candidate fix.

In the running example, the component (13, idx > index, True) leads to several
candidate fixes. Schema (b) with the component's predicate idx > index as
fail, the expression modification idx := idx - 1 as snippet, and the original
instruction go_i_th (idx) as old_stmt produces a correct fix. A different
combination, which also produces a correct fix, is schema (d) with fail using the
component's predicate idx > index, the instruction with the expression
replacement go_i_th (idx - 1) as snippet, and the original instruction
go_i_th (idx) as old_stmt.
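Schema instantiation is essentially template filling followed by a cross product over the available snippets. The Python sketch below renders Table 1 as text templates and enumerates the candidates for the running example. It is an illustration only: the templates, the `fail_condition` and `candidates` helpers, and the choice of two snippets are assumptions, and real candidates are abstract syntax trees rather than strings.

```python
from typing import List

# Table 1 rendered as Eiffel-like text templates (tabs for indentation).
# `fail` is parenthesized under `not` because Eiffel's unary `not`
# binds tighter than comparison operators.
SCHEMAS = {
    "a": "{snippet}\n{old_stmt}",
    "b": "if {fail} then\n\t{snippet}\nend\n{old_stmt}",
    "c": "if not ({fail}) then\n\t{old_stmt}\nend",
    "d": "if {fail} then\n\t{snippet}\nelse\n\t{old_stmt}\nend",
}


def fail_condition(predicate: str, value: bool) -> str:
    """`fail` takes p = v: the predicate itself if v is True, its negation otherwise."""
    return predicate if value else f"not ({predicate})"


def candidates(fail: str, snippets: List[str], old_stmt: str) -> List[str]:
    """All distinct instantiations of the four schemas."""
    result: List[str] = []
    for schema in SCHEMAS.values():
        for snippet in snippets:
            text = schema.format(fail=fail, snippet=snippet, old_stmt=old_stmt)
            if text not in result:  # schema (c) ignores the snippet
                result.append(text)
    return result


fail = fail_condition("idx > index", True)
snippets = ["idx := idx - 1", "go_i_th (idx - 1)"]  # one action from emod, one from erepl
fixes = candidates(fail, snippets, "go_i_th (idx)")
print(len(fixes))  # 7: four schemas x two snippets, minus the duplicate from (c)
print(fixes[2])    # schema (b) with the assignment snippet
```

Each of the seven strings would then replace the instruction at location 13 to form one candidate fix, which must still go through validation.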

3.8 Validation of Candidates 

The generation of candidate fixes involves the application of several heuristics
and is essentially "best effort": there is no a priori guarantee that the candidates
actually fix the program. Each candidate fix must pass a validation phase
which determines whether its deployment removes the erroneous behavior under
consideration. The validation phase runs each candidate fix through the full set
of passing and failing test cases. A fix is validated if it passes all the previously
failing test cases F_r^{c,j} and still passes the original passing test cases P_r. In
general, more than one candidate fix may pass the validation phase; AutoFix-E2
ranks all valid fixes according to the score of the state component that originated
the fix and submits the top 15 to the user, who is ultimately responsible for deciding
whether to deploy any of them.
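The validation and ranking step can be summarized as a filter followed by a sort. In the Python sketch below, candidate routines, test cases and scores are hypothetical toys: a "test case" is any callable that runs a routine and reports whether a contract was violated, and the bound check mimics a precondition such as that of go_i_th. Only the overall logic (keep candidates passing all previously failing and passing tests, rank by score, keep the top 15) follows the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Routine = Callable[[int], int]
TestCase = Callable[[Routine], bool]  # True if the run violates no contract


@dataclass
class Candidate:
    label: str
    score: float      # score of the state component that originated the fix
    routine: Routine  # the patched routine


def validate_and_rank(cands: Sequence[Candidate], failing: Sequence[TestCase],
                      passing: Sequence[TestCase], top: int = 15) -> List[Candidate]:
    """Keep candidates that pass every previously failing and passing test,
    then order them by decreasing score and keep the best `top`."""
    valid = [c for c in cands
             if all(t(c.routine) for t in failing) and all(t(c.routine) for t in passing)]
    return sorted(valid, key=lambda c: c.score, reverse=True)[:top]


# Toy setting (hypothetical): the routine must return an index in 1..COUNT.
COUNT = 3


def in_bounds(idx: int) -> TestCase:
    return lambda routine: 1 <= routine(idx) <= COUNT


failing_tests = [in_bounds(4)]                     # exposed the fault
passing_tests = [in_bounds(i) for i in (1, 2, 3)]  # already passing

cands = [
    Candidate("unchanged", 0.7, lambda idx: idx),
    Candidate("decrement if too large", 0.9, lambda idx: idx - 1 if idx > COUNT else idx),
    Candidate("always reset to 1", 0.5, lambda idx: 1),
]
for c in validate_and_rank(cands, failing_tests, passing_tests):
    print(c.label, c.score)
```

Note that the inappropriate "always reset to 1" candidate is validated too: with no postcondition to contradict it, validation cannot tell it apart from the proper fix, and only the ranking places it second.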

The correctness of a program is defined relative to its specification; in the
case of automated program fixing, the validated fixes are only as good as the
contracts. For example, routine move_item lacks a postcondition; therefore, the
simple candidate fix which unconditionally adds the assignment idx := 1 before
the call to go_i_th is validated despite being obviously inappropriate. In spite of
these limitations in principle, the experiments in Section 4 show that the
available contracts are often good enough in practice, so that AutoFix-E2 suggests
proper fixes (correct not only according to the available contracts but also
according to the intuitive expectations of developers) in the large majority of
cases where it can validate some fixes. Improving the quality of the contracts is a
related effort which can also greatly benefit from automation [20] and whose
results boost the effectiveness of automated program fixing.

4 Experimental Evaluation
4.1 Experimental Setup 

All the experiments ran on a Windows 7 machine with a 2.66 GHz Intel
dual-core CPU and 4 GB of memory. On average, AutoFix-E2 ran for 7.6 minutes
for each fault.

4.1.1 Selection of Faults 

The experiments include faults from two sources: data structure classes from
commercial libraries, and an implementation of a library to manipulate text
documents developed as a student project.

Data structure libraries. Table 2 lists the 15 classes from the EiffelBase [7]
(rev. 507) and Gobo [9] (rev. 79072) libraries used in the experiments; the table
reports the length in lines of code (LOC), the total number of routines (#R) and
boolean queries (#B) of each class, and the number of faults (#F) considered in
the experiments. This selection of faults combines 13 faults used in the
evaluation of model-based fixing [21] with 51 new faults recently found by
AutoTest. We did not reuse the remaining 29 faults used in [21] because they are
not reproducible in the latest revision of the libraries.

Table 2: EiffelBase and Gobo classes.

    Class                         LOC     #R    #B    #F
    ACTIVE_LIST                  2162    139    19     2
    ARRAY                        1464    101    11     9
    ARRAYED_CIRCULAR             1910    133    25     3
    ARRAYED_SET                  2345    146    18     5
    DS_ARRAYED_LIST              2762    166     9     5
    DS_HASH_SET                  3076    169    10     1
    DS_LINKED_LIST               3434    160     8     5
    HASH_TABLE                   2036    118    19     2
    INTEGER_32                   1115     99     5     1
    LINKED_LIST                  2000    109    16     1
    LINKED_PRIORITY_QUEUE        2374    125    17     1
    LINKED_SET                   2352    122    16     5
    REAL_64                       839     72     4     1
    SUBSET_STRATEGY_HASHABLE      543     33     0     4
    TWO_WAY_SORTED_SET           2868    141    18    19
    Total                       31280   1833   195    64



A library to manipulate text documents. The second part of the
evaluation targets a library to manipulate text documents and convert them
into HTML and LaTeX. The library models entities such as formatted text,
lists, tables, and images; it has been implemented as a student project of the
Software Architecture course held in the spring semester 2010 at ETH Zurich.
Table 3 lists the 3 classes of the library used in the experiments, with the same
statistics as in Table 2. Compared to EiffelBase and Gobo, the text document
library's classes have a more primitive interface, with very few boolean queries
(31 of the 32 boolean queries of class FILE_NAME are inherited from the library
class STRING, hence they are mostly unrelated to the specific semantics of
FILE_NAME) and less detailed contracts; therefore, they are representative of
less mature software whose functionality is complex to specify formally. AutoTest
detected 9 faults (#F) in the classes: 5 precondition violations, 3 intermediate
assertion violations, and 1 call on void target (null pointer dereference).

Table 3: Document manipulation library classes.

    Class               LOC     #R    #B    #F
    FILE_NAME          4297    258    32     2
    HTML_TRANSLATOR    1148     83     0     1
    LATEX_TRANSLATOR   1269     90     0     6
    Total              6714    431    32     9

4.1.2 Selection of Test Cases

All the experiments used test cases generated automatically by AutoTest; this 
demonstrates complete automation of the whole debugging process and mini- 
mizes the potential bias introduced by experimenters. AutoTest produced an 
average of 25 passing and 11 failing test cases for each fault. 

4.2 Experimental Results 
4.2.1 Data Structure Libraries 

Table 4 summarizes the results of the experiments on the data structure
libraries: the number #F of faults in each category, the faults fixed with
model-based techniques using AutoFix-E, and those fixed with code-based techniques
using AutoFix-E2. The count of valid fixes only includes those which are proper,
that is, those that manual inspection confirmed to be adequate beyond the
correctness criterion provided by the available contracts and tests. The faults fixed
by AutoFix-E2 are a superset of those fixed by AutoFix-E; when both tools
succeeded, they produced equivalent fixes (with possibly negligible syntactic
differences). Unlike what is customary in evaluating fault localization
techniques, we refrained from injecting additional bugs into EiffelBase and Gobo,
so that the evaluation deals only with real bugs found in production software.

Of the 50 faults not fixed, about 25 expose design errors, rather than mere
programming errors: for example, several of the faults point to inconsistencies
in the inheritance hierarchy of the library. Another 19 faults originate from
incorrect or incomplete contracts, such as weak class invariants that let objects
reach inconsistent states. The remaining 6 faults are of various types, including
some involving non-functional properties. To our knowledge, automatically fixing
most of these "deep" errors is beyond the capabilities of any existing automatic
program fixing technique.



Table 4: Faults fixed in EiffelBase and Gobo classes.

    Type of fault                       #F   Model      Code
    Precondition violation              22   10 (45%)   12 (54%)
    Postcondition violation             30   0 (0%)     2 (6%)
    Call on void target                  7   0 (0%)     0 (0%)
    Intermediate assertion violation     5   0 (0%)     0 (0%)
    Total                               64   10 (15%)   14 (22%)


The results show that code-based techniques constitute a significant
improvement over model-based techniques. Even if model-based techniques already
perform quite satisfactorily on data structure implementations, thanks to the high
quality of the queries available in their interfaces, code-based fixing succeeded
with 4 more errors (a 40% improvement). Most of the errors where code-based
fixing succeeds and model-based techniques fail are indicative of subtle bugs
with non-obvious fixes. Three are precondition violations: one is described in
Section 2; the other two, from classes DS_HASH_SET and HASH_TABLE, are
similar in that the fix requires referencing a local variable rather than public
queries. The other fault is a postcondition violation, which model-based
techniques cannot handle as it requires a fix in a location different from where
the violation occurs (i.e., at the end of the routine's body).

Listing 2: Routine visit_table fixed by AutoFix-E2.

     1  visit_table (a_table: TABLE)
     2      local s: STRING; i: INTEGER
     3      do
     4          packages.extend ("tabulary")
     5          create s.make_empty
     6          if a_table.count > 0 then  -- Added by AutoFix-E2
     7              from i := 1
     8              until i > a_table.column_count loop
     9                  -- Append to 's' the table's content,
    10                  -- in LaTeX form
    11              end
    12          end  -- Added by AutoFix-E2
    13
    14          s.prepend ("|"); s.append ("|")
    15          open_environment ("tabulary", s)
    16
    17      end

4.2.2 Text Document Manipulation Library 

The second set of experiments tried to determine whether code-based techniques can
successfully tackle software beyond well-engineered data structure
implementations. AutoFix-E2 built valid fixes for 5 of the 9 faults in the document
library: one in each of the classes FILE_NAME and HTML_TRANSLATOR, and 3 in
the class LATEX_TRANSLATOR. In comparison, AutoFix-E only fixed one of
the faults, which AutoFix-E2 also fixed; manual inspection confirmed the
expectation that model-based fixing fails whenever the fault conditions cannot be
characterized using only boolean queries, as is the case for nearly all the errors
in the text document library.

As an example from these experiments, Listing 2 shows the essential parts
of a routine visit_table fixed by AutoFix-E2. Routine visit_table converts data
in table form, passed as argument a_table, into LaTeX. To this end, it first opens
a "tabulary" environment (line 4); then, the loop on lines 7-11 converts the
content of the various columns into the string s; finally, it adds delimiters to
the table (line 14), and stores the content of s in the "tabulary" environment
(line 15). The loop fails when the table is empty, because the query column_count
of a_table has a precondition count > 0. The fix wraps a conditional statement
(lines 6-12) around the loop; correspondingly, an empty table becomes an empty
LaTeX table, as appropriate. This example gives an idea of the kinds of fixes
generated in the second set of experiments, and of how code-based techniques can
be successful on them.

4.2.3 Overall Performance of Code-Based Fixing 

In the experiments, code-based techniques fixed 19 errors (14 in the data
structure libraries and 5 in the document library), 73% more than the 11 errors
fixed by model-based techniques.
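The relative improvements quoted in this section follow directly from the raw counts. The short Python check below recomputes them from the figures in Table 4 (14 versus 10 fixes on the data structure libraries) and Section 4.2.2 (5 versus 1 fix on the document library); it adds no new data.

```python
# Faults fixed, as reported in Table 4 and Section 4.2.2.
MODEL_DS, CODE_DS = 10, 14    # EiffelBase and Gobo (64 faults)
MODEL_DOC, CODE_DOC = 1, 5    # text document library (9 faults)


def improvement(new: int, old: int) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


model_total = MODEL_DS + MODEL_DOC  # 11
code_total = CODE_DS + CODE_DOC     # 19
print(f"data structures: {improvement(CODE_DS, MODEL_DS):.0f}% more fixes")
print(f"overall: {code_total} vs {model_total}, {improvement(code_total, model_total):.0f}% more fixes")
```

The overall figure is 19/11 ≈ 1.727, which rounds to the 73% improvement stated above; the data structure figure is 14/10 = 1.4, the 40% of Section 4.2.1.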

4.3 Threats to Validity

Some threats may limit the generalizability of the results:

• The choice of using only automatically generated test cases may affect the
performance and quality of the results. On the other hand, if code-based
fixing also works well with manually written test cases, it can be applied
to more software.

• The evaluation uses real software and real errors made by programmers, 
but it could target even more classes of diverse application domains. A 
larger-scale thorough evaluation belongs to future work. 

• Our notion of correctness is relative to the available contracts. Corre- 
spondingly, the quality of the contracts may affect the quality of fixes 
produced, but we do not know to what extent this holds for the classes 
used in the experiments. 

5 Related Work 

This section summarizes the most relevant related work in two areas: fault 
localization and automated program fixing. 

5.1 Fault Localization 

Fault localization is the process of locating erroneous statements in a program.
Several suggested solutions to this problem use heuristics based on code coverage
(e.g., [13, 19]) or program states (e.g., [11, 14]).

Code coverage. Code coverage metrics have been used to rank instructions
based on their likelihood to trigger failures. [13], for example, introduces the
notion of failure rate: based on a large number of test cases, an instruction
has a high failure rate if it is executed more often in failing test cases than in
passing test cases. A block of code is then "suspicious" of being faulty if it
includes many instructions with high failure rate; [13] suggests visualizing the
failure rates with colors and brightness, and implements the scheme in the tool
Tarantula. [19] proposes a fault localization technique named nearest neighbor.
The nearest neighbor of a given failing test case is the passing test case in
a test suite which is most similar to the failing test case. Removing all the
instructions mentioned in the nearest neighbor from the failing test produces a
smaller set of instructions; these are the candidates to be responsible for the
fault under consideration. Several other authors have extended code coverage
techniques for fault localization. For example, [25] addresses the propagation
of infected program states; [TS] relies on a model-based approach; and [23]
performs an extensive comparison of variants of fault localization techniques
and outlines general principles behind them. [4] discusses the limitations of
using only state invariants for fault localization, a limitation present in
model-based fixing techniques but removed with the code-based approach.
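Both coverage-based ideas fit in a few lines of Python. The sketch below uses the suspiciousness formula commonly associated with Tarantula (fraction of failing runs covering a line, divided by that fraction plus the fraction of passing runs covering it) and a simplified nearest-neighbor report that measures similarity as the size of the symmetric difference between binary coverage sets. Both are illustrations under these assumptions, not the exact formulations of [13] and [19], and the coverage data is invented.

```python
from typing import Dict, List, Set

Coverage = Set[int]  # line numbers executed by one test run


def tarantula(passing: List[Coverage], failing: List[Coverage]) -> Dict[int, float]:
    """Suspiciousness in [0, 1] for every line covered by some run."""
    scores: Dict[int, float] = {}
    for line in set().union(*passing, *failing):
        fail_ratio = sum(line in run for run in failing) / len(failing) if failing else 0.0
        pass_ratio = sum(line in run for run in passing) / len(passing) if passing else 0.0
        total = fail_ratio + pass_ratio
        scores[line] = fail_ratio / total if total else 0.0
    return scores


def nearest_neighbor_report(failing_run: Coverage, passing: List[Coverage]) -> Coverage:
    """Lines of the failing run not covered by its most similar passing run
    (similarity measured here as the size of the coverage symmetric difference)."""
    nearest = min(passing, key=lambda run: len(run ^ failing_run))
    return failing_run - nearest


passing_runs = [{1, 2, 3, 5}, {1, 2, 5}]
failing_runs = [{1, 2, 3, 4, 5}]
scores = tarantula(passing_runs, failing_runs)
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # line 4 ranks first
print(nearest_neighbor_report(failing_runs[0], passing_runs))  # {4}
```

Line 4, executed only by the failing run, gets the maximal suspiciousness of 1.0, while lines executed by every run score 0.5; the nearest-neighbor report singles out the same line.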

Program states. The application of code coverage techniques produces a
set of instructions likely to be responsible for failure; programmers still have to
examine each instruction to understand where the problem is. Fault localization
techniques based on program states aim at alleviating this task. [11], for
example, requires programmers to insert check points in the program to mark
"points of interest". Then, a dynamic analysis similar to [13], but applied to
program states rather than locations, identifies a set of suspicious states. Such
a state-based analysis is finer-grained than those based only on code coverage;
furthermore, the usage of check points introduces more flexibility to skip
uninteresting parts of the computation, for example repeated iterations of a loop.
Delta debugging addresses similar issues: it isolates the variables, and their
values, relevant to a failure by analyzing the state difference between passing
and failing test cases. Most fault localization techniques target each fault
individually, hence they perform poorly when multiple bugs interact and must be
considered together. To address such scenarios, [14] introduces a technique that
separates the effects of multiple faults and identifies predictors associated with
each fault.
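The state-isolation idea behind delta debugging can be illustrated with a simplified version of Zeller's ddmin algorithm, here the variant that only tests complements. In this hedged sketch, program states are flat dictionaries of integer variables and the `check` function standing in for a test run is hypothetical; the original technique operates on much richer program states. The result is a 1-minimal set of variables whose failing values, transplanted into the passing state, still reproduce the failure.

```python
from typing import Callable, Dict, List

State = Dict[str, int]


def state_diff(passing: State, failing: State) -> List[str]:
    """Variables whose values differ between the passing and failing runs."""
    return [var for var in failing if passing.get(var) != failing[var]]


def make_oracle(passing: State, failing: State,
                check: Callable[[State], bool]) -> Callable[[List[str]], bool]:
    """Transplants the failing values of the given variables into the passing
    state and reports whether the (hypothetical) check then fails."""
    def fails(variables: List[str]) -> bool:
        mixed = dict(passing)
        for var in variables:
            mixed[var] = failing[var]
        return not check(mixed)
    return fails


def ddmin(changes: List[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Simplified ddmin (complements only): shrink `changes` to a 1-minimal
    subset that still makes `fails` true."""
    assert fails(changes), "the full difference must reproduce the failure"
    n = 2
    while len(changes) >= 2:
        size = len(changes) / n
        start = 0.0
        reduced = False
        while start < len(changes):
            complement = changes[:int(start)] + changes[int(start + size):]
            if fails(complement):
                changes, n, reduced = complement, max(n - 1, 2), True
                break
            start += size
        if not reduced:
            if n >= len(changes):
                break
            n = min(n * 2, len(changes))
    return changes


passing_state = {"x": 1, "y": 2, "z": 3, "w": 0}
failing_state = {"x": 1, "y": 5, "z": 7, "w": 9}
# Hypothetical assertion: the run fails whenever y exceeds 4.
fails = make_oracle(passing_state, failing_state, lambda s: s["y"] <= 4)
print(ddmin(state_diff(passing_state, failing_state), fails))  # ['y']
```

Of the three differing variables y, z and w, only y is needed to reproduce the failure, so the analysis points the programmer at y alone.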

Fault localization in code-based fixing. The code-based program fixing
techniques of the present paper also exploit fault localization techniques.
To generate fixes completely automatically, however, fault localization must be
sufficiently precise to suggest only a limited number of "suspicious" instructions.
In our case, the usage of contracts helps to restrict the search to the boundaries of
the routine where a contract violation occurs. Then, the combination of static
and dynamic analysis techniques that rank state components within routines
produces fault localization sufficiently accurate for fixing faults automatically.

5.2 Automated Program Fixing 

This section reviews the most significant contributions to automated fixing of
source code. The related work section in our previous work [21] also describes
different approaches working at runtime on the compiled binary.

[12] presents BugFix, a tool that helps developers fix bugs by suggesting 
patches. Their approach uses machine-learning techniques, which can work 
without annotations such as contracts. BugFix learns existing fixes in the form 
of association rules, and it tries to apply the rules learned to new bugs. Users 
can provide feedback — in the form of new examples of correct fixes or valida- 
tions of suggestions provided by the tool — which ameliorates the quality of the 
suggestions provided over time. 

Other authors apply genetic algorithms to generate fixes automatically. [1]
uses a co-evolutionary scheme where an initially faulty program and some test
cases compete to evolve the program into one that satisfies its formal
specification. [22] describes a technique, based on genetic algorithms, that takes a
program, a set of successful test cases, and one failing test case. After rounds
of evolution, the program changes into one that passes all test cases (including
the failing test case). While [22]'s results are significant, as they can patch real
programs of non-trivial size, the role played by evolutionary techniques is not
entirely clear: as pointed out also in [2], the experiments of [22] span only a
limited number of generations (about 10), which suggests that the genetic
algorithm performs only a very limited search in the space of possible solutions.
Another limitation of [22] resides in its sensitivity to the quality (and size) of the
provided test suite, an effect which is much less relevant in our approach, where
random testing techniques can generate a suitable test suite automatically.

[10] presents a technique that compares two program states at a faulty
location in the program. Unlike all other approaches to program fixing to date,
[10] computes program states statically, using weakest precondition reasoning.
The comparison of the two program states illustrates the source of the error; a
change to the program that reconciles the two states fixes the bug. Weakest
precondition reasoning allows for a quite detailed characterization of the states, but
it also requires starting from a strong postcondition (a full functional
specification), whereas methods based mostly on dynamic analysis, such as code-based
fixing, provide approximate yet useful characterizations even with very weak
formal specifications.
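To make the dependence on strong postconditions concrete, the standard weakest-precondition rules for assignment, sequencing, and conditionals are recalled below (in Dijkstra's formulation, not necessarily the exact calculus of [10]), followed by two instances on the running example's variable idx. The postcondition 1 ≤ idx ≤ count is hypothetical, chosen only for illustration; the last line shows that a trivial postcondition yields a trivial, uninformative precondition.

```latex
\begin{align*}
wp(x := e,\; Q) &\equiv Q[x \mapsto e] \\
wp(S_1;\ S_2,\; Q) &\equiv wp(S_1,\; wp(S_2,\; Q)) \\
wp(\textbf{if } b \textbf{ then } S_1 \textbf{ else } S_2 \textbf{ end},\; Q)
  &\equiv (b \Rightarrow wp(S_1, Q)) \wedge (\neg b \Rightarrow wp(S_2, Q)) \\[4pt]
wp(idx := idx - 1,\; 1 \le idx \le count)
  &\equiv 1 \le idx - 1 \le count \equiv 2 \le idx \le count + 1 \\
wp(idx := idx - 1,\; \mathit{True}) &\equiv \mathit{True}
\end{align*}
```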

5.3 Our Previous Work 

As part of the AutoFix joint project between ETH Zurich and Saarland
University, we developed the tools Pachika and AutoFix-E. Pachika [5] automatically
builds finite-state behavioral models from a set of passing and failing test cases
of a Java program. Pachika also generates fix candidates by modifying the model
of failing runs in a way which makes it compatible with the model of passing
runs. The modifications can insert new transitions or delete existing transitions
to change the behavior of the failing model; the changes in the model are then
propagated back to the Java implementation.

AutoFix-E [21] implements the first automatic program fixing tool for Eiffel,
based on model-based techniques. AutoFix-E uses argumentless boolean queries
to abstract the object space, hence it works best for classes with a detailed
interface. Code-based techniques improve on model-based ones by locating faults
based on both dynamic and static analysis techniques.

6 Future Work 

Future work includes the following aspects: 

• All our experiments with automated fixing involve automatically generated
test cases, but state-of-the-art random testing is not applicable to
every type of program; for instance, applications involving input through
files or an interactive graphical interface are arduous to test automatically.
We plan to experiment with our techniques for automated fixing on new types
of software with manually written test cases.

• A non-negligible portion of the bugs found in EiffelBase, likely
representative of much of the software written in Eiffel, are due to incorrect
contracts rather than implementations. We will try to turn our approach to
program fixing around and fix contracts when the implementation is correct. This
effort is tightly related to our other work on contract inference [20].

• Applying program fixing techniques to languages without contracts
requires considering other types of faults to fix, including exceptions and
I/O errors. We will extend AutoFix-E2 to handle these types of faults.

• While the majority of bugs can be fixed with a small patch, there exist 
conspicuous errors that require significant changes to the code. We plan to 
apply more ambitious code synthesis techniques to the problem of building 
a fix once the "cause" of a fault is known. 

• AutoFix-E2 is part of the Eve verification environment [5]. As part of
the ongoing development of Eve, we will improve the user interface of
AutoFix-E2 and its integration with Eve's other verification aids.



7 Conclusion 

This paper introduced code-based automated program fixing, a novel approach
to automatically generate corrections of errors in software equipped with
contracts. Preliminary experiments with the supporting tool AutoFix-E2
demonstrate that code-based techniques extend the applicability of automated
program fixing to more faults in classes beyond well-designed data structure
implementations.

Availability. The AutoFix-E2 source code, and all data and results cited in 
this article, are available at: 

http://se.inf.ethz.ch/research/autofix/